Artificial intelligence-based classification of breast lesion from contrast enhanced mammography: a multicenter study

Purpose: The authors aimed to establish an artificial intelligence (AI)-based method for preoperative diagnosis of breast lesions from contrast enhanced mammography (CEM) and to explore its biological mechanism. Materials and methods: This retrospective study included 1430 eligible patients who underwent CEM examination from June 2017 to July 2022 and were divided into a construction set (n=1101), an internal test set (n=196), and a pooled external test set (n=133). The AI model adopted RefineNet as a backbone network, and an attention sub-network, named convolutional block attention module (CBAM), was built upon the backbone for adaptive feature refinement. An XGBoost classifier was used to integrate the refined deep learning features with clinical characteristics to differentiate benign and malignant breast lesions. The authors further retrained the AI model to distinguish in situ and invasive carcinoma among breast cancer candidates. RNA-sequencing data from 12 patients were used to explore the underlying biological basis of the AI prediction. Results: The AI model achieved an area under the curve of 0.932 in diagnosing benign and malignant breast lesions in the pooled external test set, better than the best-performing deep learning model, radiomics model, and radiologists. Moreover, the AI model also achieved satisfactory results (areas under the curve from 0.788 to 0.824) for the diagnosis of in situ and invasive carcinoma in the test sets. Further, the biological basis exploration revealed that the high-risk group was associated with pathways such as extracellular matrix organization. Conclusions: The AI model based on CEM and clinical characteristics had good predictive performance in the diagnosis of breast lesions.


Introduction
Early diagnosis of breast cancer prior to metastasis increases the efficacy of treatment and consequently leads to notable improvements in survival rates. In addition, there are significant differences in treatment strategies and prognosis between the two types of breast cancer: in situ and invasive carcinoma. Because of the low incidence of axillary involvement in in situ carcinoma (1-2%), sentinel lymph node biopsy (SLNB) is not recommended when planning breast conserving surgery [1]. However, in cases of invasive carcinoma, SLNB or axillary lymph node dissection (ALND) is necessary. Therefore, early differentiation of in situ and invasive carcinoma allows different surgical and treatment plans to be developed for patients.
Contrast enhanced mammography (CEM), as an emerging technology, is increasingly used in clinical practice because it can depict the vascularity of lesions [2] and has a high sensitivity similar to that of MRI in the diagnosis of breast cancer [3]. Nevertheless, CEM has unsatisfactory specificity (66-84%) [4,5]. Furthermore, the interpretation of traditional imaging examinations depends on the experience of the radiologist, and there is great variation among radiologists. In particular, in situ and invasive carcinoma cannot be well distinguished on the basis of these imaging examinations. Therefore, an automatic, reliable, preoperative, and non-invasive way to differentiate benign and malignant breast lesions, and to further differentiate in situ and invasive carcinoma, is important.
A powerful artificial intelligence (AI) technology known as "deep learning" is gaining extensive attention for its excellent performance in image recognition tasks [6]. Although previous studies have applied deep learning to CEM images for the prediction of benign and malignant breast lesions [7,8], the sample sizes of these studies were small, and they lacked multicenter data to verify generalization ability. Moreover, the value of applying deep learning to predict in situ and invasive carcinoma on CEM is unclear.
Using CEM images from larger cohorts, previous studies have adopted deep learning models to differentiate benign and malignant breast lesions, with satisfactory results [9][10][11]. Based on this, we further explored the value of CEM images in differentiating in situ and invasive carcinoma in this study. In addition, despite its robust learning capabilities, deep learning lacks biological interpretability of the learned features. A few studies have begun to focus on the underlying gene expression patterns of deep learning models [12,13]. However, to our knowledge, there is currently no research focusing on the biological basis of breast cancer prediction models in CEM images.
In this study, we aimed to develop an AI-based classification model that combined deep learning features from CEM images and clinical characteristics for the preoperative diagnosis of benign and malignant breast lesions, as well as in situ and invasive carcinoma, and used multicenter data from four hospitals for testing. The biological basis underlying its predictions was also explored.

Patients and datasets
Ethical approval for this study was provided by the Ethical Committee of our hospital on 21 November 2022 (Approval No.: 2022-303). The retrospective multicenter study was approved by our institutional review board and patient informed consent was waived. This work has been reported in line with the STARD (Standards for the Reporting of Diagnostic accuracy studies) criteria [14] (Supplemental Digital Content 1, http://links.lww.com/JS9/B719). Data from five centres from 2017 to 2022 were included. We enrolled female patients from the five centres who underwent CEM examination and had histologically confirmed breast lesions. The patient inclusion and exclusion workflow is shown in Fig. 1.

Lesion region segmentation
Lesion regions of interest were manually delineated by R1 on low-energy and recombined images of the craniocaudal view using ITK-SNAP (version 3.6; www.itksnap.org). The radiologist was blinded to pathological data. The segmentation process is shown in Supplementary eFigure 1, Supplemental Digital Content 2, http://links.lww.com/JS9/B720. After 4 months, the images of 200 patients were randomly selected and segmented again by another two radiologists with 9 and 13 years of experience in reviewing breast screening (R2 and R3). Then, the Dice similarity coefficient (DSC) [15] was calculated to assess the agreement of the image segmentation.
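The agreement measure used here can be illustrated with a minimal sketch. It assumes that each segmentation is available as a binary mask of identical shape (represented below as nested lists of 0/1); the function name `dice_coefficient` and the toy masks are illustrative and are not taken from the study's code.

```python
def dice_coefficient(mask_a, mask_b):
    """Dice similarity coefficient between two binary masks of equal shape."""
    flat_a = [bool(v) for row in mask_a for v in row]
    flat_b = [bool(v) for row in mask_b for v in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("masks must have the same shape")
    # Pixels labelled as lesion by both readers.
    intersection = sum(1 for a, b in zip(flat_a, flat_b) if a and b)
    total = sum(flat_a) + sum(flat_b)
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total


# Toy example: reader 1 marks 4 pixels, reader 2 marks 3 of the same pixels.
reader_1 = [[0, 1, 1],
            [0, 1, 1],
            [0, 0, 0]]
reader_2 = [[0, 1, 1],
            [0, 1, 0],
            [0, 0, 0]]
print(round(dice_coefficient(reader_1, reader_2), 3))
```

A DSC of 1 indicates identical delineations and 0 indicates no overlap, so pairwise values can be averaged across lesions to summarize inter-reader agreement.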

Model establishment
Figure 2A illustrates the detailed architecture of our proposed AI model. The AI model includes deep learning feature extraction and classification modules. The deep learning feature extraction module used RefineNet [16], with an encoder and a decoder, as the backbone network to extract deep features. Then, a convolutional block attention module (CBAM) [17] was inserted into the last convolutional layer of the encoder for adaptive feature refinement (Fig. 2B). The CBAM is a convolutional attention mechanism module that can gradually focus on precise targets, namely, high-level semantics. The output of the CBAM was passed to a global average pooling (GAP) layer to eliminate redundant features, yielding the refined deep learning features of the CEM images.
In the classification module, we used the XGBoost classifier to combine the refined deep learning features and clinical characteristics to jointly make a decision. The gold standard was pathology, based on specimen analysis from either a breast biopsy or surgery; details can be found in Supplementary eMethod 3, Supplemental Digital Content 2, http://links.lww.com/JS9/B720. The XGBoost output probability was regarded as the result of the benign and malignant classification. To demonstrate the effectiveness of our AI model, ablation experiments were conducted. All details of the model architecture are provided in Supplementary eMethod 4, Supplemental Digital Content 2, http://links.lww.com/JS9/B720. The proposed AI model was also compared with traditional radiomics and clinical models. The detailed construction process of the radiomics models is shown in Supplementary eMethod 7, Supplemental Digital Content 2, http://links.lww.com/JS9/B720.
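To make the refinement-then-pooling step concrete, the following is a simplified sketch in plain Python. It implements only the channel-attention half of CBAM (average- and max-pooled channel descriptors passed through a shared two-layer MLP, summed, and squashed with a sigmoid) followed by global average pooling; the spatial-attention half is omitted for brevity. The weights `w1` and `w2`, the tiny feature map, and all function names are assumptions for illustration, not the trained network described in this study.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(weights, vector):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in weights]


def channel_mean(channel):
    return sum(sum(row) for row in channel) / (len(channel) * len(channel[0]))


def shared_mlp(descriptor, w1, w2):
    """Two-layer MLP with ReLU, shared by the avg- and max-pooled descriptors."""
    hidden = [max(0.0, h) for h in matvec(w1, descriptor)]
    return matvec(w2, hidden)


def channel_attention(feature_map, w1, w2):
    """One attention weight in (0, 1) per channel of a C x H x W feature map."""
    avg_desc = [channel_mean(ch) for ch in feature_map]
    max_desc = [max(max(row) for row in ch) for ch in feature_map]
    logits = [a + m for a, m in zip(shared_mlp(avg_desc, w1, w2),
                                    shared_mlp(max_desc, w1, w2))]
    return [sigmoid(z) for z in logits]


def refine_and_pool(feature_map, w1, w2):
    """Rescale each channel by its attention weight, then apply GAP."""
    weights = channel_attention(feature_map, w1, w2)
    refined = [[[wt * v for v in row] for row in ch]
               for wt, ch in zip(weights, feature_map)]
    return [channel_mean(ch) for ch in refined]


# Toy feature map with 2 channels of size 2 x 2 and hypothetical MLP weights.
fmap = [[[1.0, 2.0], [3.0, 4.0]],
        [[0.0, 0.0], [0.0, 8.0]]]
w1 = [[0.5, -0.5]]        # reduces 2 channels to 1 hidden unit
w2 = [[1.0], [-1.0]]      # expands back to 2 channels
print(refine_and_pool(fmap, w1, w2))
```

The pooled output is one value per channel, which is the kind of fixed-length feature vector that can be concatenated with clinical variables before being passed to a tabular classifier.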
In addition, the proposed AI model was retrained to distinguish in situ and invasive carcinoma among breast cancer candidates to further explore the diagnostic value of CEM images.

Readers study
Two additional radiologists (R4 and R5) with 7 and 13 years of experience in breast screening, who were not involved in other aspects of the study, assessed the benign or malignant nature of breast lesions in the internal and pooled external test sets, respectively. They were blinded to the histopathological data but were aware of the age and CEM images of each patient. The evaluation of benign and malignant breast lesions was based on the following aspects: breast composition, suspicious lesion type (mass, calcification, asymmetry, or architectural distortion), enhancement lesion type, internal enhancement pattern, background parenchymal enhancement, degree of lesion enhancement, and lesion diameter [18,19]. Then, we compared the performance of the two radiologists with that of the AI model.
Each breast lesion was then re-read by the radiologists with the assistance of the AI model to evaluate how well the AI model could assist radiologists. There was no washout period. Finally, the performance of radiologists with AI model assistance was compared with that of radiologists alone.

Biological basis exploration
To reveal the underlying biological basis of the AI prediction, gene analyses were performed on the 12 patients with RNA-sequencing data. The twelve patients were divided into high-risk and low-risk subgroups of breast cancer according to the AI prediction. The differentially expressed genes (DEGs) between the two risk groups were identified using the R package DESeq2 according to the criteria of |log2 (Fold Change)| greater than 2 and adjusted P value less than 0.05. Subsequently, Kyoto Encyclopedia of Genes and Genomes (KEGG) and Gene Ontology (GO) analyses were performed using the R package clusterProfiler to identify the enriched pathways between high- and low-risk patients.
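The DEG selection criteria stated above can be expressed as a simple filter. The sketch below is in Python rather than R and assumes that the differential expression results have already been exported as (gene, log2 fold change, adjusted P value) tuples; the gene names and values are placeholders, not data from this study, and `select_degs` is a hypothetical helper.

```python
def select_degs(results, lfc_cutoff=2.0, padj_cutoff=0.05):
    """Split genes into up- and downregulated DEGs.

    A gene is kept when |log2 fold change| > lfc_cutoff and the adjusted
    P value is < padj_cutoff. Genes with a missing adjusted P value (None)
    are skipped.
    """
    upregulated, downregulated = [], []
    for gene, log2_fc, padj in results:
        if padj is None or padj >= padj_cutoff:
            continue
        if abs(log2_fc) <= lfc_cutoff:
            continue
        if log2_fc > 0:
            upregulated.append(gene)
        else:
            downregulated.append(gene)
    return upregulated, downregulated


# Placeholder results table: (gene, log2 fold change, adjusted P value).
example_results = [
    ("GENE_A", 3.1, 0.001),    # passes both criteria, upregulated
    ("GENE_B", -2.6, 0.010),   # passes both criteria, downregulated
    ("GENE_C", 1.4, 0.0001),   # fold change too small
    ("GENE_D", 4.0, 0.200),    # not significant after adjustment
    ("GENE_E", -5.0, None),    # adjusted P value unavailable
]
up, down = select_degs(example_results)
print(up, down)
```

The sign convention (positive fold change meaning higher expression in the high-risk group) depends on how the contrast was specified and is an assumption here.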

Model evaluation and statistical analysis
The area under the receiver operating characteristic (ROC) curve (AUC), the area under the precision-recall curve (AUPRC), and the confusion matrix were used to assess the performance of the model. The accuracy, sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV) were calculated from the ROC curve at the cutoff maximizing the Youden index, and the corresponding 95% CIs were reported. In addition, a 95% sensitivity threshold was defined. DeLong's test [20] was used to compare the statistical differences between AUCs. We also evaluated the performance of the model in the different lesion diameter subgroups. All statistical analyses were implemented using R software (version 4.0.3; www.r-project.org) and Python (version 3.6.6). For clinical characteristics, descriptive statistics were summarized as mean ± standard deviation or as frequencies and percentages. Continuous variables were compared using the independent t-test or Mann-Whitney U test, and categorical variables were compared using Fisher's exact test or the chi-square test, to assess the differences between patients in different groups. A two-sided P less than 0.05 was regarded as statistically significant. Sample size was calculated from a previous pilot reader study [21].
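As an illustration of how these metrics relate to one another, the sketch below computes the AUC through its rank (Mann-Whitney) interpretation, selects the cutoff that maximizes the Youden index J = sensitivity + specificity - 1, and derives the threshold-dependent metrics at that cutoff. It assumes binary labels (1 = malignant, 0 = benign), predicted probabilities, and that both classes are present; the scores are toy values, the function names are illustrative, and confidence intervals and DeLong's test are not covered.

```python
def confusion_counts(labels, scores, threshold):
    """Return (tp, fp, tn, fn), calling a case positive when score >= threshold."""
    tp = fp = tn = fn = 0
    for y, s in zip(labels, scores):
        predicted_positive = s >= threshold
        if y == 1 and predicted_positive:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def _ratio(numerator, denominator):
    return numerator / denominator if denominator else float("nan")


def diagnostic_metrics(labels, scores, threshold):
    tp, fp, tn, fn = confusion_counts(labels, scores, threshold)
    return {
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "sensitivity": _ratio(tp, tp + fn),
        "specificity": _ratio(tn, tn + fp),
        "ppv": _ratio(tp, tp + fp),
        "npv": _ratio(tn, tn + fn),
    }


def youden_cutoff(labels, scores):
    """Observed score that maximizes sensitivity + specificity - 1."""
    def youden(threshold):
        m = diagnostic_metrics(labels, scores, threshold)
        return m["sensitivity"] + m["specificity"] - 1.0
    # Ties are resolved in favour of the lowest candidate threshold.
    return max(sorted(set(scores)), key=youden)


def auc(labels, scores):
    """Probability that a random positive outscores a random negative."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data: four benign (0) and four malignant (1) lesions.
toy_labels = [0, 0, 0, 1, 1, 0, 1, 1]
toy_scores = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.9]
cutoff = youden_cutoff(toy_labels, toy_scores)
print(cutoff, diagnostic_metrics(toy_labels, toy_scores, cutoff),
      auc(toy_labels, toy_scores))
```

Because PPV and NPV depend on the proportion of malignant lesions in the evaluated set, values obtained this way are specific to the case mix of each test set.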

Clinical characteristics
In this study, 1297 patients from centre 1 were divided into a construction set (805 malignant lesions and 296 benign lesions) and an internal test set (154 malignant lesions and 42 benign lesions). The 133 patients from centre 2 to centre 5, with 30 benign lesions and 103 malignant lesions, were used as a pooled external test set. The mean age was 51.54 ± 11.61 years (range, 17-85 years) in the construction set, 52.62 ± 11.36 years (range, 17-80 years) in the internal test set, and 51.23 ± 10.42 years (range, 23-83 years) in the pooled external test set. The histopathological type and clinical characteristics are described in Table 1. In addition, 37 of 773 (5%) participants with breast cancer in the construction set, 6 of 148 (4%) in the internal test set, and 10 of 103 (10%) in the pooled external test set had in situ carcinomas. The rest had invasive carcinoma. No statistical differences in clinical characteristics were identified between patients with in situ carcinoma and patients with invasive carcinoma in any dataset (Supplementary eTable 1, Supplemental Digital Content 2, http://links.lww.com/JS9/B720).

Segmentation similarity
The average DSC among R1, R2, and R3 was 0.90, with 0.87 for benign lesions and 0.93 for malignant lesions. R1 and R2 had an average DSC of 0.89. R2 and R3 had an average DSC of 0.93. R1 and R3 had an average DSC of 0.88.
For differentiation between in situ and invasive carcinoma, our AI model showed AUCs of 0.824 and 0.788 in the internal test set and pooled external test set; an accuracy of 78.4% and 75.7%; a sensitivity of 83.3% and 70.0%; and a specificity of 78.2% and 76.3% (Fig. 3D and eTable 2, Supplemental Digital Content 2, http://links.lww.com/JS9/B720).
The confusion matrices of the AI model for the diagnosis of breast lesions in the two test sets are shown in Supplementary eFigure 4, Supplemental Digital Content 2, http://links.lww.com/JS9/B720. Figure 2C shows the comparison between the input image and the generated image obtained through the AI model, which indicates that the extracted deep learning features contain the inherent features necessary to characterize breast lesions. The accuracy and loss curves in Supplementary eFigure 5, Supplemental Digital Content 2, http://links.lww.com/JS9/B720, show the training process on the construction set. The figure indicates that the model converged well without overfitting.
In addition, we analyzed the cases misclassified by the AI model. In the case shown in Fig. 4A, higher gland density on the low-energy image obscures the tumour boundary. In the case shown in Fig. 4B, heterogeneous enhancement of the lesion and an ambiguous boundary led to misclassification by the AI model. The AI model has higher error rates for lesions with ambiguous boundaries.

Performance of radiologists and radiologists with the AI model assistance
The performance of the AI model was compared with that of radiologists (R4 and R5) in the two test sets. The specificity and sensitivity points for the radiologists' performance on the internal and pooled external test sets were plotted in the ROC space, as shown in Fig. 5A and B. The figure shows that the points of the radiologists lie below the ROC curve of the AI model. Performance metrics are reported in Table 3. In addition, our model took 1.9 and 1.2 h from segmentation to image analysis on the internal and pooled external test sets, respectively, which was less time than that taken by the radiologists (Supplementary eTable 6, Supplemental Digital Content 2, http://links.lww.com/JS9/B720).
Table 4 illustrates the diagnostic performance of the radiologists with the help of the AI model. Figure 5A and B show that the performance of radiologists with AI model assistance surpassed that of our AI model.

Performance of the AI model in different lesion diameter subgroups
We tested the performance of the AI model on the different lesion diameter subgroups in the test sets. Supplementary eFigure 7, Supplemental Digital Content 2, http://links.lww.com/JS9/B720, demonstrates that the AUCs of the AI model were 0.811, 0.886, and 0.943 for the lesion diameter subgroups of less than or equal to 1 cm, 1-2 cm, and greater than or equal to 2 cm in the internal test set, and 1.000, 0.920, and 0.938 in the pooled external test set.

Biological basis exploration
Considering that all breast cancer samples for transcriptome sequencing were invasive carcinomas, we only explored the biological basis of the primary diagnosis model. The heatmap of gene expression in 8 high-risk patients and 4 low-risk patients is presented in Fig. 6A, demonstrating significant differences between the two risk groups. In addition, differentially expressed genes related to breast cancer, such as FHL1, GPM6B, RELN, and CXCL10, were discovered between the two risk groups (Fig. 6B). The KEGG and GO analyses based on the DEGs identified the key biological pathways, as shown in Fig. 6C and D.
KEGG analysis indicated that several pathways, such as ECM-receptor interaction, focal adhesion, and human papillomavirus infection, were significantly upregulated in high-risk patients. The GO analysis showed that the DEGs were mainly enriched in extracellular matrix organization and structure organization.

Discussion
In this multicenter study, we developed and tested an AI model combining deep learning features and clinical characteristics for the preoperative diagnosis of breast lesions on CEM. Compared with other models, the AI model demonstrated the best prediction performance, with an AUC of 0.932 on the pooled external test set. Notably, an association analysis of AI prediction with RNA-sequencing was performed to reveal the underlying biological basis of the model. To the best of our knowledge, this is the first study to distinguish in situ and invasive carcinoma based on AI on CEM images and to explore the biological basis of an AI model.
Several studies have shown that radiomics can predict benign and malignant breast lesions [5,[22][23][24]. However, the features extracted by this method are manually designed and are highly unstable. The deep learning method can not only automatically extract high-throughput features, avoiding the bias of manually designed features in radiomics, but also achieve superior performance compared with traditional methods as the amount of training data increases [25]. In this study, we also demonstrated that the AI model surpassed radiomics models. Perek et al. [7] first presented a deep learning decision support system based on CEM images to improve the specificity (66%) of breast cancer diagnosis without affecting sensitivity, but only 129 patients were included. Song et al. [8] proposed a multiview multimodal network for breast cancer diagnosis on CEM, in which only 95 patients were enrolled. Dominique et al. [26] presented a deep learning model that effectively characterizes malignant breast lesions, which performed well in characterizing oestrogen receptor status and in differentiating triple-negative breast cancers. However, no external validation was conducted. These studies may not meet practical requirements because large-scale, multicenter data were not yet available. Thus, compared with previous similar studies, we collected a larger dataset and used multicenter data from various regions of China (east, west, south, and north) to evaluate the generalization and application capabilities of our model.
Many studies have also begun to focus on the prediction of breast ductal carcinoma in situ using MRI or ultrasound [27,28], but the value of CEM images in distinguishing in situ carcinoma from invasive carcinoma is still unclear. In the present study, we found that the AI model based on CEM images performed well in discriminating in situ from invasive carcinoma.
Recently, many studies have achieved good results in deep learning-based diagnosis of benign and malignant breast lesions with CEM images [9][10][11]29]. In contrast, in this study, we proposed an AI model that can not only distinguish benign and malignant breast lesions but also differentiate in situ and invasive carcinoma. In addition, we incorporated data from more centres to test the generalization ability of the model. Notably, before applying imaging-based AI models in clinical settings, it is crucial to elucidate the underlying biological basis behind them [30]. To this end, we employed gene analysis of RNA-sequencing data to uncover the biological foundation of the AI model. The results showed that the high-risk phenotypes were linked to tumour proliferation pathways such as ECM-receptor interaction, extracellular matrix organization, and focal adhesion. The extracellular matrix could influence the invasion of tumour cells through different mechanical signal transduction pathways [31]. A core component of the focal adhesion pathway is focal adhesion kinase, which can regulate the levels of nucleostemin, a nucleolar protein involved in promoting breast tumour growth [32]. This is consistent with our findings.
We defined the cutoff point maximizing the Youden index and the 95% sensitivity threshold.The cutoff point maximizing the Youden index balanced the sensitivity and specificity of the model, resulting in the highest accuracy (85-87%) in diagnosing benign and malignant lesions.The 95% sensitivity threshold could achieve an NPV of 73-76%, thus the AI model could prevent patients with benign breast lesions from undergoing unnecessary biopsies on the premise of avoiding missed diagnosis.
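The high-sensitivity operating point discussed above can be sketched as follows. The exact procedure used to fix the 95% sensitivity threshold is not described in the text, so this example assumes one common choice: the highest observed score at which sensitivity is still at least 95%. The NPV is then computed at that threshold. Function names and the toy scores are illustrative only.

```python
def threshold_at_sensitivity(labels, scores, target=0.95):
    """Highest observed score whose sensitivity is at least `target`."""
    n_pos = sum(1 for y in labels if y == 1)
    if n_pos == 0:
        raise ValueError("at least one positive case is required")
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(labels, scores)
                 if y == 1 and s >= threshold)
        if tp / n_pos >= target:
            return threshold
    raise ValueError("no threshold reaches the target sensitivity")


def negative_predictive_value(labels, scores, threshold):
    """Fraction of cases scored below the threshold that are truly benign."""
    tn = sum(1 for y, s in zip(labels, scores) if y == 0 and s < threshold)
    fn = sum(1 for y, s in zip(labels, scores) if y == 1 and s < threshold)
    return tn / (tn + fn) if (tn + fn) else float("nan")


# Toy data: 20 malignant lesions (one with a very low score) and 5 benign.
malignant_scores = [0.05] + [0.5 + i / 50 for i in range(19)]
benign_scores = [0.1, 0.2, 0.3, 0.4, 0.6]
demo_labels = [1] * len(malignant_scores) + [0] * len(benign_scores)
demo_scores = malignant_scores + benign_scores

t95 = threshold_at_sensitivity(demo_labels, demo_scores)
print(t95, negative_predictive_value(demo_labels, demo_scores, t95))
```

Lesions scoring below such a threshold are the candidates for avoiding biopsy, and the NPV quantifies how often that call would be correct in a population with the same case mix.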
Our study also has several limitations. First, although we included a sizeable multicenter CEM dataset, we did not include prospective or international datasets, and we excluded non-mass lesions. We will focus on collecting a larger variety of lesion types and more cross-regional and cross-country datasets in future prospective studies to further improve the generalization and application capabilities of the model. Second, although the Dice similarity coefficient indicated good agreement, the segmentation of the lesions was manually drawn by the radiologists. Third, the numbers of benign and malignant lesions in the dataset are unbalanced. Although the class weight method was used to mitigate this problem, the class imbalance may not be completely resolved. Fourth, background parenchymal enhancement may affect model performance in small lesions. Fifth, the study only included low-energy and recombined images of the craniocaudal view. The mediolateral-oblique view, high-energy images, and images from other modalities will be included in future analyses to add more image information. Lastly, we only collected RNA-sequencing data from 12 cases and will continue to expand the sample size for further exploration.
In conclusion, the proposed AI model, which integrates refined deep learning features and clinical characteristics, can efficiently and noninvasively identify benign and malignant breast lesions from CEM and distinguish in situ from invasive carcinoma. Further prospective validation in different countries will provide additional evidence for our model to assist clinical decision-making.

Ethical approval
Ethical approval for this study was provided by the Ethical Committee of Yantai Yuhuangding Hospital on 21 November 2022 (Approval No.: 2022-303).

Consent
The retrospective multicenter study was approved by the Yantai Yuhuangding Hospital institutional review board and patient informed consent was waived.

Figure 1. The patient inclusion and exclusion workflow. CEM, contrast enhanced mammography.

Figure 2. Architecture of our artificial intelligence (AI) model for contrast enhanced mammography (CEM) breast lesion classification. (A) The model includes deep learning feature extraction and classification modules. The deep feature extraction module used RefineNet as the backbone network, and the convolutional block attention module (CBAM) was inserted into the last convolutional layer of the encoder network. The output of CBAM was applied to the global average pooling (GAP) layer to generate refined deep learning features of the CEM images. The classification module used the XGBoost classifier to fuse the refined deep learning features and clinical characteristics for preoperative diagnosis of breast lesions. (B) Network structure of the CBAM attention module. The input feature was sequentially passed through the channel and spatial attention modules to obtain a refined feature. (C) Input images and the corresponding images generated by the deep feature extraction module.

Figure 3. Receiver operating characteristic (ROC) curves of the different models for primary diagnosis of benign and malignant breast lesions in the construction set (A), internal test set (B), and pooled external test set (C); (D) ROC curves of our AI model for diagnosis of in situ and invasive carcinoma among breast cancer candidates in the construction set, internal test set, and pooled external test set. AI, artificial intelligence; AUC, area under the receiver operating characteristic curve; CBAM, convolutional block attention module; LR, logistic regression.

Figure 4. Examples of errors made by the artificial intelligence model. (A) Images in a 56-year-old woman with invasive ductal carcinoma. Contrast enhanced mammography (CEM) images show a 0.8 cm mass, BI-RADS 4B. (B) Images in a 56-year-old woman with intraductal papilloma. CEM images show a 1.5 cm mass, BI-RADS 4C.

Figure 5. Receiver operating characteristic (ROC) curves of our AI model for diagnosis of benign and malignant breast lesions and points (specificity and sensitivity) of radiologists in the internal test set (A) and in the pooled external test set (B). AI, artificial intelligence; AUC, area under the ROC curve.

Figure 6. The genetic analysis for exploring the underlying biological basis of the artificial intelligence (AI) model. (A) Heatmap illustrating gene expression profiles for the 12 breast cancer cases; (B) Volcano plot of differential gene expression between risk groups predicted by the AI model. The red dots represent genes upregulated in high-risk patients, and the blue dots represent genes upregulated in low-risk patients; (C) Bubble plot of the top 15 enriched pathways of the differentially expressed gene sets identified by Gene Ontology (GO) analysis, ordered by odds ratio; (D) Upregulated and downregulated pathways in the high-risk group based on Kyoto Encyclopedia of Genes and Genomes (KEGG) analysis.

This study was supported by the National Natural Science Foundation of China (82371933, 82001775 and 62176140), Taishan Scholar Foundation of Shandong Province of China (tsqn202211378), Natural Science Foundation of Shandong Province of China (ZR2021MH120) and Special Fund for Breast Disease Research of Shandong Medical Association (YXH2021ZX055).

HIGHLIGHTS
• The artificial intelligence (AI) model combining attention-based deep learning features and clinical characteristics can diagnose benign and malignant breast lesions, as well as in situ and invasive carcinoma, and was superior to other deep learning and radiomics models in the diagnosis of breast lesions.

Table 1
Histopathological type and clinical characteristics for 1430 patients in our dataset.

Table 2
Performance of the different models in the construction set, internal test set, and pooled external test set.

Table 3
Comparison of the performance of the AI model and radiologists for breast lesion identification on internal and pooled external test sets.

Table 4
Performance of radiologists with the artificial intelligence (AI) model assistance in internal and pooled external test sets.